Handling Nested Parallelism and Extreme Load Imbalance in an Orbital Analysis Code

Authors

  • Benjamin Gaska
  • Neha Jothi
  • Mahdi Soltan Mohammadi
  • Kat Volk
  • Michelle Mills Strout
Abstract

Nested parallelism arises in scientific codes that search multi-dimensional spaces. However, implementations of nested parallelism often suffer from overhead and load-balance issues. The Orbital Analysis code we present exhibits a sparse search space and significant load imbalance, and it stops as soon as the first solution is found. All of these aspects of the algorithm make it harder to use nested parallelism effectively. In this paper, we present an inspector/executor strategy that chunks such computations into parallel wavefronts. The resulting shared-memory parallelization is no longer nested and exhibits significantly less load imbalance. We evaluate this approach on the Orbital Analysis code and improve the execution time of the original implementation by an order of magnitude. As part of a graduate computer science course on parallel programming models, we show how the approach can be implemented in parallel Perl, Python, Chapel, Pthreads, and OpenMP. Future work includes investigating how to automate and generalize the parallelization approach.
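
The following is a minimal sketch, in C with OpenMP, of the inspector/executor idea described in the abstract; it is not the authors' code, and is_candidate() and test_solution() are hypothetical stand-ins for the orbital-analysis predicates. The inspector flattens the sparse nested search space into a worklist, and the executor processes that list in chunked parallel wavefronts, stopping after the first wavefront that contains a solution.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical stand-ins for the orbital-analysis predicates. */
    static int is_candidate(int i, int j)  { return (i * j) % 7 == 0; }
    static int test_solution(int i, int j) { return i == 840 && j == 660; }

    int main(void) {
        enum { NI = 1000, NJ = 1000, CHUNK = 4096 };
        int *work_i = malloc(sizeof(int) * (size_t)NI * NJ);
        int *work_j = malloc(sizeof(int) * (size_t)NI * NJ);
        int nwork = 0;

        /* Inspector: flatten the sparse nested search space into a worklist. */
        for (int i = 0; i < NI; i++)
            for (int j = 0; j < NJ; j++)
                if (is_candidate(i, j)) {
                    work_i[nwork] = i;
                    work_j[nwork] = j;
                    nwork++;
                }

        /* Executor: process the worklist in chunks ("wavefronts").  Each chunk is
         * one flat, dynamically scheduled parallel loop (no nesting), and the
         * search stops after the first chunk that contains a solution, so the
         * earliest solution in worklist order is the one reported. */
        int sol = -1;
        for (int start = 0; start < nwork && sol < 0; start += CHUNK) {
            int end  = (start + CHUNK < nwork) ? start + CHUNK : nwork;
            int best = nwork;             /* earliest index that tests true */
            #pragma omp parallel for schedule(dynamic) reduction(min: best)
            for (int k = start; k < end; k++)
                if (test_solution(work_i[k], work_j[k]) && k < best)
                    best = k;
            if (best < nwork) sol = best;
        }

        if (sol >= 0)
            printf("first solution at (%d, %d)\n", work_i[sol], work_j[sol]);
        free(work_i);
        free(work_j);
        return 0;
    }

Because each wavefront is a single flat, dynamically scheduled loop, cheap and expensive iterations mix within a chunk, which is what removes most of the load imbalance; the chunk size trades early-termination latency against scheduling overhead.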

Similar articles

Handling Transient and Persistent Imbalance Together in Distributed and Shared Memory

The recent trend of a rapid increase in the number of cores per chip has resulted in a vast amount of on-node parallelism. Not only is the number of cores per node increasing substantially, but the cores are also becoming heterogeneous. The high variability in the performance of the hardware components introduces imbalance due to heterogeneity. The applications are also becoming more complex, resulting...

A Study on Load Imbalance in Parallel Hypermatrix Multiplication Using OpenMP

In this paper we present our work on the parallelization of a matrix multiplication code based on the hypermatrix data structure. We have used OpenMP for the parallelization. We have added OpenMP directives to a few loops and experimented with several features available with OpenMP in the Intel Fortran Compiler: scheduling algorithms, chunk sizes, and nested parallelism. We found that the lo...
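
As a rough illustration only (the study itself targets a hypermatrix code in Fortran, which is not reproduced here), the C/OpenMP fragment below shows the three features named above, the scheduling algorithm, the chunk size, and nested parallelism, applied to a plain matrix multiply.

    #include <stdio.h>
    #include <omp.h>

    #define N 512

    static double A[N][N], B[N][N], C[N][N];

    int main(void) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                A[i][j] = 1.0;
                B[i][j] = 2.0;
            }

        omp_set_max_active_levels(2);                 /* enable nested parallel regions */

        #pragma omp parallel for schedule(dynamic, 8) /* scheduling algorithm + chunk size */
        for (int i = 0; i < N; i++) {
            #pragma omp parallel for schedule(static) /* nested parallel region over columns */
            for (int j = 0; j < N; j++) {
                double sum = 0.0;
                for (int k = 0; k < N; k++)
                    sum += A[i][k] * B[k][j];
                C[i][j] = sum;
            }
        }

        printf("C[0][0] = %.1f\n", C[0][0]);          /* expect 2.0 * N = 1024.0 */
        return 0;
    }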

Compile-Time Partitioning of Three-Dimensional Iteration Spaces

This paper presents a strategy for compile-time partitioning of generalised three-dimensional iteration spaces; it can be applied to loop nests comprising two inner nested loops, both of which have bounds linearly dependent on the index of the outermost parallel loop. The strategy is analysed using symbolic analysis techniques for enumerating loop iterations, which can provide estimates for the l...
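
A small illustrative example, not taken from the paper, of the class of loop nests it describes: both inner bounds depend linearly on the outermost parallel index i, so the work per outer iteration grows quadratically with i and an even static split of the i loop is inherently imbalanced.

    #include <stdio.h>

    int main(void) {
        enum { N = 100 };
        long count = 0;

        #pragma omp parallel for schedule(static) reduction(+: count)
        for (int i = 0; i < N; i++)          /* outermost parallel loop             */
            for (int j = 0; j <= i; j++)     /* inner bound linearly dependent on i */
                for (int k = 0; k <= i; k++) /* inner bound linearly dependent on i */
                    count++;

        printf("total inner iterations: %ld\n", count);
        return 0;
    }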

Task-Based Execution of Nested OpenMP Loops

In this work we propose a novel technique to reduce the overheads related to nested parallel loops in OpenMP programs. In particular, we show that in many cases it is possible to replace the code of a nested parallel-for loop with equivalent code that creates tasks instead of threads, thereby limiting parallelism levels while allowing more opportunities for runtime load balancing. In addition, we...
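
A sketch of that transformation, assuming OpenMP tasking (this is not the authors' implementation): the inner parallel-for, which would otherwise spawn a second thread team, is replaced with a taskloop so the inner iterations become tasks that the single existing team can pick up for load balancing; process() is a hypothetical stand-in for imbalanced per-iteration work.

    #include <stdio.h>

    #define N 64
    #define M 64

    static double result[N][M];

    /* Hypothetical, deliberately imbalanced per-(i, j) work. */
    static void process(int i, int j) {
        double x = 0.0;
        for (int k = 0; k < (i + 1) * 1000; k++)
            x += (double)k * j;
        result[i][j] = x;
    }

    int main(void) {
        /* One thread team for the whole nest ... */
        #pragma omp parallel for schedule(dynamic)
        for (int i = 0; i < N; i++) {
            /* ... and instead of a nested "#pragma omp parallel for" here, the
             * inner iterations become tasks the existing team can execute. */
            #pragma omp taskloop grainsize(8)
            for (int j = 0; j < M; j++)
                process(i, j);
        }

        printf("result[%d][%d] = %.1f\n", N - 1, M - 1, result[N - 1][M - 1]);
        return 0;
    }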

An Efficient Implementation of Nested Data Parallelism for Irregular Divide-and-Conquer Algorithms

This paper presents work in progress on a new method of implementing irregular divide-and-conquer algorithms in a nested data-parallel language model on distributed-memory multiprocessors. The main features discussed are the recursive subdivision of asynchronous processor groups to match the change from data-parallel to control-parallel behavior over the lifetime of an algorithm, switching from ...

Journal:
  • CoRR

Volume: abs/1707.09668

Publication date: 2017